Scientific Python antipatterns advent calendar day eighteen

For today, a very common “defensive programming” habit that often backfires: catching exceptions too broadly (or worse, swallowing them). As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Catching exceptions too broadly

As soon as we start processing real data in our programs, we will run into data points that cause a problem. Here’s a tiny example, given a list of pairs of numbers:

data = [
    (1,2),
    (3,0),
    (5,4)
]

we want to calculate the ratio for each pair, i.e. the first number divided by the second. As long as we remember how to iterate over lists of tuples from day seven, the code is straightforward:

for x,y in data:
    ratio = x / y
    print(ratio)

0.5
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[3], line 2
      1 for x,y in data:
----> 2     ratio = x / y
      3     print(ratio)

ZeroDivisionError: division by zero

but as we see, we run into problems when the second number is zero: Python refuses to divide by zero and raises a ZeroDivisionError, which stops the loop before it reaches the third pair.

In real data this would be very annoying; imagine that instead of three data points we have millions. We don’t want a single “bad” data point to prevent our program running on the rest.

A tempting pattern is to wrap the code in a try block and tell Python to ignore any errors:

for x,y in data:
    try:
        ratio = x / y
        print(ratio)
    except:
        pass

0.5
1.25

This works, and our program can now finish running on the rest of the data. But we are setting ourselves up for problems later on.

Firstly, by silently dropping bad data points we lose any ability to diagnose problems with the data. There’s no way to tell from the output how many data points were skipped. So at a minimum it’s a good idea to log them:

for x,y in data:
    try:
        ratio = x / y
        print(ratio)
    except:
        print('something went wrong')

0.5
something went wrong
1.25

For larger datasets, which might produce a lot of output, we will probably want to write these messages to a log file. It might also be useful to store the bad data points and summarise them at the end:

skipped = []

for x,y in data:
    try:
        ratio = x / y
        print(ratio)
    except:
        skipped.append((x,y))

print(f'skipped {len(skipped)} bad data point(s):')
print(skipped)

0.5
1.25
skipped 1 bad data point(s):
[(3, 0)]

The more serious problem with our code, though, is that it catches every possible error and treats them all in the same way. A bare except even catches KeyboardInterrupt, so pressing Ctrl-C may not stop the loop. Let’s try to add a filter and only print the pairs where the ratio is greater than 0.9:

for x,y in data:
    try:
        ratio = x / y
        if ration > 0.9:
            print(x, y)
    except:
        pass

This code produces no output, so we might assume that there are no data points with a ratio greater than 0.9. But looking at the previous output, we can easily see that there are. In fact, we have made a typo in the code and typed ration rather than ratio, but the error message that would usually accompany this typo gets hidden by our except block. This will make it very difficult to find and fix the bug!
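When a block like this goes quiet, one quick way to find out what it is hiding is to record the type of each exception instead of discarding it. The snippet below is only a diagnostic sketch, not a recommended final form: it still catches broadly (via `except Exception as err`), and the `caught` list is a name made up for this illustration.

```python
data = [(1, 2), (3, 0), (5, 4)]

caught = []  # hypothetical helper list: names of the exceptions we swallowed

for x, y in data:
    try:
        ratio = x / y
        if ration > 0.9:  # deliberate typo, same as in the example above
            print(x, y)
    except Exception as err:
        # Record what went wrong instead of silently passing
        caught.append(type(err).__name__)

print(caught)
# ['NameError', 'ZeroDivisionError', 'NameError']
```

Two quite different problems were being lumped together: one genuine bad data point, and a bug in our own code that fires for every other pair.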

The better approach to error handling is to be more specific about what type of error we want to skip. Simply adding an exception type to our code improves things a great deal:

for x,y in data:
    try:
        ratio = x / y
        print(ratio)
    except ZeroDivisionError:
        pass

0.5
1.25

Now we have code that only skips errors caused by dividing by zero, while leaving other error messages unchanged:

for x,y in data:
    try:
        ratio = x / y
        if ration > 0.9:
            print(x, y)
    except ZeroDivisionError:
        pass

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[13], line 4
      2 try:
      3     ratio = x / y
----> 4     if ration > 0.9:
      5         print(x, y)
      6 except ZeroDivisionError:

NameError: name 'ration' is not defined
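A related refinement, which goes slightly beyond the example above, is to keep the try block as small as possible so that only the line we expect to fail is guarded. Python's `else` clause on a `try` statement runs only when no exception was raised, which makes this easy to express. This is a sketch of the idea; the `selected` and `skipped` lists are names introduced here for illustration.

```python
data = [(1, 2), (3, 0), (5, 4)]

selected = []  # pairs whose ratio passes the filter
skipped = []   # pairs we could not divide

for x, y in data:
    try:
        # Only the division itself is guarded
        ratio = x / y
    except ZeroDivisionError:
        skipped.append((x, y))
    else:
        # Runs only if the division succeeded; errors here are not caught
        if ratio > 0.9:
            selected.append((x, y))

print(selected)
# [(5, 4)]
```

With this layout, even an unexpected ZeroDivisionError raised by the filtering code would surface normally instead of being mistaken for a bad data point.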

Of course, we can combine this with our error-logging code to end up with a program that is more robust and maintainable than the one we started with, especially if we modify the summary to mention exactly why some points were skipped:

skipped = []

for x,y in data:
    try:
        ratio = x / y
        if ratio > 0.9:
            print(x, y)
    except ZeroDivisionError:
        skipped.append((x,y))

print(f'skipped {len(skipped)} data point(s) with zero y:')
print(skipped)

5 4
skipped 1 data point(s) with zero y:
[(3, 0)]
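The earlier suggestion of writing skipped points to a log file fits naturally with the specific exception type. Here is a minimal sketch using the standard library `logging` module. The logger name, the file name and the temporary directory are all assumptions made for this example; a real project would pick its own log location.

```python
import logging
import os
import tempfile

data = [(1, 2), (3, 0), (5, 4)]

# Hypothetical log location, placed in a temporary directory for the demo
log_path = os.path.join(tempfile.mkdtemp(), "skipped_points.log")

logger = logging.getLogger("ratio_example")  # hypothetical logger name
logger.setLevel(logging.WARNING)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

skipped = []

for x, y in data:
    try:
        ratio = x / y
    except ZeroDivisionError as err:
        # Record which pair was skipped and why
        logger.warning("skipped (%s, %s): %s", x, y, err)
        skipped.append((x, y))
    else:
        if ratio > 0.9:
            print(x, y)

handler.close()
print(f"skipped {len(skipped)} data point(s), details in {log_path}")
```

Because the reason is written next to each skipped pair, the log stays useful even if we later decide to skip other specific error types as well.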
